Abstract

In this paper, we study the problem of constructing perfect phylogenies for three-state characters. Our work builds on 
two recent results. The first result states that for three-state characters, the local condition of examining all subsets of 
three characters is sufficient to determine the global property of admitting a perfect phylogeny. The second result 
applies tools from minimal triangulation theory to the partition intersection graph to determine if a perfect phylogeny 
exists. Despite the wealth of combinatorial tools and algorithms stemming from the chordal graph and minimal 
triangulation literature, it is unclear how to use such approaches to efficiently construct a perfect phylogeny for 
three-state characters when the data admits one. We utilize structural properties of both the partition intersection 
graph and the original data in order to achieve a competitive time bound.
Background

In this paper, we study the problem of constructing phy- 
logenies, or evolutionary trees, to describe ancestral rela- 
tionships between a set of observed taxa. Each taxon is 
represented by a sequence and the evolutionary tree pro- 
vides an explanation of branching patterns of mutation 
events transforming one sequence into another. 

We will focus on the widely studied infinite sites model 
from population genetics, in which the mutation of any 
character can occur at most once in the phylogeny. With- 
out recombination, the phylogeny is a tree called a perfect 
phylogeny. The problem of determining if a set of binary 
sequences fits the infinite sites model without recom- 
bination corresponds to determining if the data can be 
derived on a perfect phylogeny. A generalization of the 
infinite sites model is the infinite alleles model, in which 
any character can mutate multiple times but each muta- 
tion of the character must lead to a distinct allele (state). 
Again, without recombination, the phylogeny is a tree,
called a multi-state perfect phylogeny. Correspondingly,
the problem of determining if multi-state data fits the
infinite alleles model without recombination corresponds
to determining if the data can be derived on a multi-state
perfect phylogeny.

Dress and Steel [1] and Kannan and Warnow [2] both 
give algorithms that construct perfect phylogenies for 
three-state characters when one exists. The goal of this 
work is to extend the results in [3] using the minimal 
separators of the partition intersection graph to create a 
three-state construction algorithm that is competitive with
Dress and Steel's algorithm.

Notation and prior results 

The input to our problem is a set of n taxa defined over
a set of m characters $C = \{\chi^1, \ldots, \chi^m\}$. We denote
the states of character $\chi^i$ by $\chi^i_j$ for $0 \le j \le r - 1$.
A species is any sequence $s = s_1, s_2, \ldots, s_m$ with
$s_i \in \{\chi^i_0, \chi^i_1, \ldots, \chi^i_{r-1}\} \cup \{*\}$ for
$i = 1, 2, \ldots, m$. The * denotes a missing value. $\chi^i$ can
also be considered as a function mapping species to character
states, writing $\chi^i(s) = s_i$. In this paper, every taxon is a
species without missing values (C is also called a set of full
characters in the literature). We will consider the set of taxa
as an $n \times m$ matrix M, where each row corresponds to a taxon
and each column corresponds to a character (or site).

The perfect phylogeny problem is to determine whether 
the taxa defined by a matrix M can be displayed on a tree 
T such that

1. each taxon of M labels exactly one node in T,

2. each leaf in T is labeled by a taxon of M, 

3. each node of T is labeled by a species, 

4. for every character $\chi^i$ and for every state $\chi^i_j$ of
character $\chi^i$, the set of all nodes in T labeled by
species whose state of character $\chi^i$ is $\chi^i_j$ forms a
connected subtree of T.
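Condition 4 can be checked mechanically for a candidate tree. The following is a minimal sketch, not part of the original paper: it assumes the tree is given as an adjacency dictionary `tree_adj` and that `labels` maps every node to a species written as a string with one symbol per character (both names are hypothetical).

```python
from collections import defaultdict

def is_convex(tree_adj, labels, char_index):
    """True if every state of one character induces a connected subtree."""
    nodes_by_state = defaultdict(set)
    for node, species in labels.items():
        nodes_by_state[species[char_index]].add(node)
    for nodes in nodes_by_state.values():
        # Depth-first search restricted to the nodes carrying this state.
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in tree_adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if seen != nodes:
            return False
    return True

def satisfies_condition_4(tree_adj, labels):
    """Check condition 4 for every character of the labeled tree."""
    m = len(next(iter(labels.values())))
    return all(is_convex(tree_adj, labels, i) for i in range(m))
```

On the path a - b - c, the labels 00, 01, 11 pass the check, while 00, 01, 00 fail because state 0 of the second character is interrupted at b.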

Any tree satisfying conditions 1 - 4 is called a perfect 
phylogeny for M. Any character satisfying condition 4 is 
said to be compatible with T. The general perfect phy- 
logeny problem (with no constraints on r, n, and m) is 
NP-complete [4,5]. However, the perfect phylogeny prob- 
lem becomes polynomially solvable (in n and m) when r is 
fixed. For r = 2, this follows from the Splits Equivalence 
Theorem [6,7]. For r = 3, Dress and Steel gave an $O(nm^2)$
algorithm [1] and for r = 3 or 4, Kannan and Warnow
gave an $O(n^2 m)$ algorithm [2]. For any fixed r, Agarwala
and Fernández-Baca gave an $O(2^{3r}(nm^3 + m^4))$ algorithm
[8], which was improved to $O(2^{2r} nm^2)$ by Kannan and
Warnow [9].

Definition 2.1. [7,10] For a set of input taxa M, the partition
intersection graph G(M) is obtained by associating a vertex
for each character state and an edge between two vertices
$\chi^i_j$ and $\chi^k_l$ if there exists a taxon s with
$\chi^i(s) = \chi^i_j$ and $\chi^k(s) = \chi^k_l$.

Note that by definition, there are no edges in the partition
intersection graph between states of the same character.
It will be useful to consider the partition intersection
graph $G(\chi^i, \chi^j, \chi^k)$ of the submatrix of M defined by
the three characters $\chi^i, \chi^j, \chi^k$.
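As an illustration of Definition 2.1, the sketch below builds the partition intersection graph directly from the rows of M. It is only a sketch under assumptions: M is taken to be a list of equal-length strings (one symbol per character), a vertex is encoded as a pair `(character index, state)`, and the function name is hypothetical.

```python
from itertools import combinations

def partition_intersection_graph(M):
    """Return (vertices, edges) of G(M); edges are 2-element frozensets."""
    vertices = {(i, s) for row in M for i, s in enumerate(row)}
    edges = set()
    for row in M:
        # Every taxon contributes a clique on its m character states.
        for u, v in combinations(enumerate(row), 2):
            edges.add(frozenset({u, v}))
    return vertices, edges
```

Because each row contributes one state per character, no edge ever joins two states of the same character. This direct construction takes time proportional to $nm^2$.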

Definition 2.2. A graph H is chordal, or triangulated,
if there are no induced chordless cycles of length four or
greater in H.

See [11] and [12] for further details on chordal graphs. 
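Chordality can be tested with Maximum Cardinality Search (MCS): a graph is chordal exactly when the reverse of an MCS visiting order is a perfect elimination ordering. The sketch below is illustrative only; it assumes the graph is an adjacency dictionary of sets, uses a simple quadratic selection step rather than the linear-time bucket structure, and its function names are hypothetical.

```python
def mcs_order(adj):
    """Visit vertices, always taking one with the most visited neighbours."""
    weight = {v: 0 for v in adj}
    order = []
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for u in adj[v]:
            if u in weight:
                weight[u] += 1
    return order

def is_chordal(adj):
    """Check that the reverse MCS order is a perfect elimination ordering."""
    order = mcs_order(adj)
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        if earlier:
            # The most recently visited earlier neighbour must see the rest.
            parent = max(earlier, key=pos.get)
            if any(u != parent and u not in adj[parent] for u in earlier):
                return False
    return True
```

A chordless four cycle is rejected, and adding one chord makes the same graph pass.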

Consider coloring the vertices of the partition intersection
graph G(M) by colors 1, 2, ..., m as follows. For each
character $\chi^i$, assign color i to the vertices
$\chi^i_0, \chi^i_1, \ldots, \chi^i_{r-1}$.
A pair of distinct vertices u, v of G(M) with the same
color is called a monochromatic pair. A proper triangulation
of the partition intersection graph G(M) is a chordal
supergraph of G(M) such that every edge has endpoints
with different colors. In [10], Buneman established the
following fundamental connection between the perfect
phylogeny problem and triangulations of the corresponding
partition intersection graph.

Theorem 2.3. [7,10] A set of taxa M admits a perfect
phylogeny if and only if the corresponding partition
intersection graph G(M) has a proper triangulation.



A triangulation of a graph G is minimal if it does
not have a proper subgraph that is also a triangulation
of G. Theorem 2.3 can be restated in terms of
proper minimal triangulations of G(M) because removing
edges from a proper triangulation will preserve the
coloring of the graph. If G(M) has a proper triangulation
H, then a perfect phylogeny for M can be constructed
from a clique tree of H. T is a clique tree for a graph
G if

1. the nodes of T are in bijection with the maximal 
cliques of G, 

2. for each vertex v of G, the maximal cliques 
containing v form a connected subtree of T. 

That is, given a clique tree T for a proper triangulation
H of G(M), we label each node by its corresponding
maximal clique. Because H is properly colored, this maximal
clique includes at most one state per character and
therefore defines a species. Each taxon t defines a clique
$K_t$ of size m in G(M), and because H is a triangulation
of G(M), $K_t$ is a clique in H as well. Furthermore, H is a
proper triangulation, so $K_t$ is a maximal clique of H. For
a clique tree T, we label the node corresponding to $K_t$ by
t to obtain a perfect phylogeny for M. Conversely, if M
has a perfect phylogeny T, then the species in T define
a set of additional edges to obtain a proper triangulation
for G(M). This is due to the following characterization
of chordal graphs by the intersections of subtrees of a
tree.

Theorem 2.4. [10,13] G is a chordal graph if and only
if there is a tree T such that each vertex u of G induces a
subtree $T_u$ of T and uv is an edge of G if and only if subtrees
$T_u$ and $T_v$ share at least one node.

In particular, if a pair of character states appear in the 
same species of a perfect phylogeny for M but not in any 
input taxon of M, this pair defines a fill edge to add to 
obtain a proper triangulation of the partition intersec- 
tion graph. This fill edge preserves the proper coloring 
because intersecting subtrees from the same character 
would contradict conditions 3 and 4 of the perfect phy- 
logeny definition. 

To illustrate some of these notions, consider the exam- 
ple in Figure 1. The species with sequence 2100 defines a 
fill edge X j Xo which is not an edge of G(M) (this is the only
such fill edge). Nevertheless, G(M) itself is chordal, and
adding this fill edge would result in a proper triangulation 
that is not minimal. 

Recent work [3] shows that there is a complete
description of minimal obstruction sets for three-state
characters analogous to a well-known result on obstruction
sets for binary characters (the four gamete condition).
These results allow us to expand upon recent work
of Gusfield [14] which uses properties of triangulations
and minimal separators of partition intersection graphs
to solve several problems related to multi-state perfect
phylogenies.

An (a,b)-separator of a graph G is a set of vertices
whose removal from G separates a and b. A minimal
(a,b)-separator is an (a,b)-separator such that no proper subset
is an (a,b)-separator, and a minimal separator is a separator
that is a minimal (a,b)-separator for some pair of
vertices a and b. For a set of vertices X, let $G - X$ be the
induced subgraph of G after removing vertices X. If S and
S' are two minimal separators of G, we say S is parallel to
S' if there is a single connected component C of $G - S'$
such that $S \subseteq C \cup S'$ (otherwise S and S' cross). A pair
of vertices a and b cross S' if S' is an (a,b)-separator. The
neighborhood of a set of vertices X is
$N(X) = \{v \in G - X : (u, v) \in E(G) \text{ for some } u \in X\}$.
A component C of $G - S$ is full if the neighborhood N(C) is
equal to S. The following characterization of minimal
separators is critical to our arguments.

Lemma 2.5. [15] Let S be a subset of vertices of graph G.
Then S is a minimal separator of G if and only if $G - S$ has
two or more full components.
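Lemma 2.5 translates directly into a test for minimal separators. The sketch below is an illustration under assumptions, not the paper's procedure: the graph is an adjacency dictionary of sets, and the function names are hypothetical.

```python
def components(adj, removed):
    """Connected components of the graph after deleting `removed`."""
    seen, comps = set(removed), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def full_components(adj, S):
    """Components C of G - S whose neighbourhood N(C) equals S."""
    S = set(S)
    full = []
    for comp in components(adj, S):
        neighbourhood = {v for u in comp for v in adj[u]} - comp
        if neighbourhood == S:
            full.append(comp)
    return full

def is_minimal_separator(adj, S):
    """Lemma 2.5: S is minimal iff G - S has at least two full components."""
    return len(full_components(adj, S)) >= 2
```

On the four cycle 1 - 2 - 3 - 4 - 1, the set {1, 3} is a minimal separator. On the path a - b - c - d, the set {b, c} separates a from d but is not minimal, since neither component is full.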

In a colored graph, a legal separator is a separator such
that no two of its vertices have the same color. Let $\Delta_G$
denote the minimal separators of graph G. For $S \in \Delta_G$, we
saturate S by adding edges between every pair of vertices
in S to create a clique. For $Q \subseteq \Delta_G$, $G_Q$ denotes the
graph obtained by saturating every $S \in Q$. The following
theorem shows the connection between minimal triangulations
and collections of parallel minimal separators of a
graph.

Theorem 2.6. (Minimal Triangulation Theorem [16-18]).
Suppose $Q \subseteq \Delta_G$ is a maximal set of pairwise parallel
minimal separators of G. Then $G_Q$ is a minimal triangulation
of G and $\Delta_{G_Q} = Q$. Conversely, if H is a minimal
triangulation of G, then $\Delta_H$ is a maximal pairwise
parallel set of minimal separators of G.

The following are necessary and sufficient conditions for
the existence of a perfect phylogeny for data over an
arbitrary number of states. We refer the reader to [14] for the
proofs.

Theorem 2.7. (Theorem 2 (MSP) [14]). For input M
over r states ($r \ge 2$), there is a perfect phylogeny for M
if and only if there is a set Q of pairwise parallel legal
minimal separators in G(M) such that every illegal minimal
separator in G(M) is crossed by at least one separator
in Q.

Theorem 2.8. (Theorem 3 (MSPN) [14]). For input M
over r states ($r \ge 2$), there is a perfect phylogeny for M
if and only if there is a set Q of pairwise parallel legal
minimal separators in partition intersection graph G(M)
such that every monochromatic pair of nodes in G(M) is
separated by some separator in Q.

For the special case of input M with characters over
three states (r = 3), the partition intersection graph has
additional structure and the following theorems give
necessary and sufficient conditions for the existence of a
perfect phylogeny for M [3].

Theorem 2.9. [3] Given an input set M with at most
three states per character ($r \le 3$), M admits a perfect
phylogeny if and only if every subset of three characters of M
admits a perfect phylogeny.

Furthermore, there is an explicit description of all 
minimal obstruction sets to the existence of a perfect 
phylogeny. 

Theorem 2.10. [3] For input M over 3-state characters, 
there exists a perfect phylogeny for M if and only if both of 
the following conditions hold: 

1. for every pair of columns of M, the partition
intersection graph induced by the columns is acyclic
and

2. for every triple of columns of M, the partition
intersection graph induced by the columns does not
contain any of the graphs shown in Figure 2 up to
relabeling of the character states.
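Condition 1 of Theorem 2.10 is straightforward to test. The sketch below is illustrative only and assumes M is a list of equal-length strings; it detects a cycle in the partition intersection graph of two columns with a small union-find structure. The function names are hypothetical, and the sketch does not address condition 2.

```python
from itertools import combinations

def pair_graph_is_acyclic(M, i, j):
    """True if the partition intersection graph of columns i and j is a forest."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Distinct edges only: repeated rows must not count as a cycle.
    edges = {((i, row[i]), (j, row[j])) for row in M}
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v were already connected: a cycle closes
        parent[ru] = rv
    return True

def all_pairs_acyclic(M):
    """Condition 1: every pair of columns induces an acyclic graph."""
    m = len(M[0])
    return all(pair_graph_is_acyclic(M, i, j)
               for i, j in combinations(range(m), 2))
```

For two binary columns, the test fails exactly when all four gametes 00, 01, 10, 11 are present, which matches the four gamete condition.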

This complete characterization of minimal obstruction
sets allows us to simplify Theorem 2.8 in the case r = 3.

Theorem 2.11. [3] For input M on at most three states
per character ($r \le 3$), there is a three-state perfect
phylogeny for M if and only if the partition intersection graph
for every pair of characters is acyclic and every
monochromatic pair of vertices in G(M) is separated by a legal
minimal separator.
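The separation condition of Theorem 2.11 can be checked once a list of legal minimal separators is available. The sketch below assumes such a list is supplied by the caller (it does not compute it), represents the graph as an adjacency dictionary of sets and colors as a dictionary, and uses hypothetical function names.

```python
from itertools import combinations

def components(adj, removed):
    """Connected components of the graph after deleting `removed`."""
    seen, comps = set(removed), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def unseparated_monochromatic_pairs(adj, color, separators):
    """Monochromatic pairs that none of the given separators separates."""
    pending = {frozenset(p) for p in combinations(adj, 2)
               if color[p[0]] == color[p[1]]}
    for S in separators:
        where = {v: k for k, comp in enumerate(components(adj, S))
                 for v in comp}
        # Keep a pair if it touches S or stays inside one component.
        pending = {p for p in pending
                   if not all(v in where for v in p)
                   or len({where[v] for v in p}) == 1}
    return pending
```

An empty result means every monochromatic pair is separated by at least one of the supplied separators.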

Theorem 2.11 shows that the requirement of Theorem 2.8
(MSPN) that the legal minimal separators in Q be pairwise
parallel can be removed for the case of input data over
three-state characters. The condition in Theorem 2.11
that the input is over three-state characters is necessary,
as there are examples showing that the theorem does not
extend to data with four-state characters.

All of the legal minimal separators for three-state input
can be found in $O(nm^2)$ time and the check that
each monochromatic pair is separated by a legal minimal
separator can be performed during the algorithm for
generating the legal minimal separators (see Section "Proper
triangulation algorithm"). Therefore, the 3-state perfect
phylogeny decision problem can be solved in $O(nm^2)$ time
using minimal separators. However, it is not clear how
minimal separators can be used to solve the construction
problem in a similar time bound. In [14], Gusfield used the
minimal separator approach and integer linear programming
methods to solve both the decision and construction
problem for k-state perfect phylogeny. Since integer linear
programming methods in general do not have polynomial
time bounds, this naturally leads to the following question:
is there an $O(nm^2)$ algorithm for the construction
problem for 3-state perfect phylogeny using the separator
approach? In this paper, we answer in the affirmative, and
show that any algorithm which explicitly computes the
partition intersection graph has a time bound of at least
$O(nm + m^2)$.

With the goal of answering this question, we study the
structure of separators in the partition intersection graph
for 3-state input. We begin by stating two lemmas
from [3].

Lemma 2.12. (Lemma 3.4 [3]). Let M be a set of input
taxa with at most three states per character, and consider
any three characters $\chi^i, \chi^j, \chi^k$. If the partition
intersection graph $G(\chi^i, \chi^j, \chi^k)$ is properly triangulatable,
then the only possible chordless cycles in $G(\chi^i, \chi^j, \chi^k)$
are chordless 4-cycles, with two colors appearing once and the
remaining color appearing twice.

Figure 2 Minimal obstruction sets. Minimal obstruction sets for three-state characters up to relabeling. The boxes highlight the input entries that are identical for three of the obstruction sets.

Lemma 2.12 implies that if a subset of three characters
$\chi^i, \chi^j, \chi^k$ is properly triangulatable, then there is a
unique set of edges $F(\chi^i, \chi^j, \chi^k)$ that must be added to
triangulate the chordless cycles in $G(\chi^i, \chi^j, \chi^k)$. Construct
a new graph G'(M) on the same vertices as G(M) with
edge set $E(G(M)) \cup \left(\bigcup_{1 \le i < j < k \le m} F(\chi^i, \chi^j, \chi^k)\right)$.
G'(M) is the partition intersection graph G(M) together with
additional edges to properly triangulate all chordless cycles in
each $G(\chi^i, \chi^j, \chi^k)$ for $1 \le i < j < k \le m$ (note these
are the chordless 4-cycles of G(M) on three colors). In
G'(M), edges from the partition intersection graph G(M)
are called E-edges and edges that have been added as
triangulation edges for some triple of columns are called
F-edges.

Lemma 2.13. (Lemmas 4.2, 4.3, 4.7 [3]) Let M be a set
of input taxa with at most three states per character, and
suppose G(M) is properly triangulatable. Then G'(M) cannot
contain a chordless cycle with one or more F-edges. If C
is a chordless cycle in G'(M) with only E-edges, then C has
length exactly four with four distinct colors.

Structure of separators 

In this section, our goal is to study the relationship
between minimal separators in G(M) and G'(M) when M
is a set of taxa over 3-state characters. Our ultimate goal is
to show that it suffices to consider only the legal minimal
separators of G(M) while disregarding the illegal minimal
separators. We first prove the following theorem on the
separator structure of G'(M).

Theorem 3.1. Let M be a set of taxa over 3-state characters.
M allows a perfect phylogeny if and only if G'(M) (the
partition intersection graph G(M) together with F-edges)
does not contain any illegal minimal separators.

Proof. Suppose M allows a perfect phylogeny and suppose
there is an illegal minimal separator S in G'(M) with
a monochromatic pair of vertices u and v. By Lemma 2.5,
there exist two full components C, D of $G'(M) - S$, and by
definition of a full component, there are paths connecting u
and v in both $C \cup \{u, v\}$ and $D \cup \{u, v\}$. Consider the
shortest such paths $P_C$ and $P_D$ respectively (note that there are
no chords within $P_C$ and no chords within $P_D$). Since C
and D are components separated by S, there are no edges
between C and D. Also, u and v are not adjacent in G'(M)
since u and v have the same color and G'(M) contains no
illegal edges. This implies the union of $P_C$ and $P_D$ creates
a chordless cycle. By Lemma 2.13, G'(M) cannot contain
any chordless cycles of length five or greater or chordless
cycles with F-edges, so the union of the paths $P_C$ and $P_D$
must be a four cycle C and in particular, must be a cycle
$u \to x \to v \to x' \to u$, where u and v have the same color.
C is a chordless four cycle in G(M) on at most three colors,
which cannot occur since we have triangulated all such
cycles by F-edges. This contradiction implies S cannot be
an illegal minimal separator.

Now, suppose G'(M) does not contain any illegal minimal
separators. By Theorem 2.7, graph G'(M) has a proper
triangulation and since G(M) is a subgraph of G'(M),
G(M) also has a proper triangulation. It follows that M has
a perfect phylogeny. □

This suggests that analyzing the minimal separators of
G'(M) suffices for 3-state construction. However, the
algorithm for enumerating the minimal separators of G(M)
necessary for proper triangulations in $O(nm^2)$ time uses
M (rather than G(M)), and it is not clear if it is possible
to extend this approach to enumerate the necessary minimal
separators of G'(M). In order to use techniques in
[14], the goal of our next two results will be to describe
the relationship between the minimal separators of G'(M)
and the legal minimal separators of G(M) when M has a
perfect phylogeny.

Lemma 3.2. Let M be a set of taxa over 3-state characters
allowing a perfect phylogeny. Then H is a proper minimal
triangulation of G(M) if and only if H is a minimal
triangulation of G'(M).

Proof. Suppose H is a proper minimal triangulation of
G(M). Each F-edge of G'(M) comes from a chordless cycle
of length four on three colors (see Lemma 2.12), so this
edge must appear in any proper triangulation of G(M).
Hence the F-edges must be edges of H, so $G'(M) \subseteq H$
and H is a proper triangulation of G'(M). If H is not
minimal with respect to G'(M), there exists H' such that
$G'(M) \subseteq H' \subset H$ and thus $G(M) \subseteq H' \subset H$,
contradicting the minimality of H with respect to G(M). Thus H is a
minimal triangulation of G'(M).

Conversely, suppose M allows a perfect phylogeny and
H is a minimal triangulation of G'(M). By Theorem 2.6,
$H = G'(M)_Q$ for a set Q of maximal pairwise parallel
minimal separators of G'(M), and these minimal separators
must be legal by Theorem 3.1. Every edge in H not in
G(M) is either an F-edge of G'(M) or a fill edge defined
by Q, and in both cases it must be a legal fill edge.
Therefore H is a proper triangulation of G(M). If there is some
proper triangulation H' of G(M) where $G(M) \subseteq H' \subseteq H$
then the F-edges of G'(M) must be edges of H',
otherwise H' has a chordless four cycle. Thus H' is a proper
triangulation of G'(M), and because H is a proper minimal
triangulation of G'(M) it must be that H' = H. Therefore
H is also a proper minimal triangulation of G(M). □

Let $\Delta^{L}_{G(M)}$ denote the set of legal minimal separators of
G(M).

Theorem 3.3. Suppose M is a set of taxa on 3-state characters
that allows a perfect phylogeny. Then the legal minimal
separators of G(M) are exactly the minimal separators
of G'(M) (i.e., $\Delta_{G'(M)} = \Delta^{L}_{G(M)}$).

Proof. Assume M has a perfect phylogeny. Consider a
minimal separator S of G'(M), and suppose Q is a set of
maximal pairwise parallel minimal separators of G'(M)
with $S \in Q$. Let $H = G'(M)_Q$. H is a minimal triangulation
of G'(M) by Theorem 2.6, and H is a proper minimal
triangulation of G(M) by Lemma 3.2. By Theorem 2.6,
Q is precisely the set of minimal separators of H.
Furthermore, because H is also a minimal triangulation of
G(M), the same theorem states that Q is a subset of the
minimal separators of G(M). Therefore $S \in \Delta_{G(M)}$, so
$\Delta_{G'(M)} \subseteq \Delta_{G(M)}$. Each minimal separator of G'(M) is
legal by Theorem 3.1. Hence $\Delta_{G'(M)} \subseteq \Delta^{L}_{G(M)}$.



Conversely, let S be a legal minimal separator of G(M), i.e.
$S \in \Delta^{L}_{G(M)}$. First we show that if
no F-edge f of G'(M) crosses S (i.e. f = xy where
S separates x and y), then S is a minimal separator
of G'(M). Let C be a connected component of
$G(M) - S$. C is still connected in G'(M), and because
no F-edge of G'(M) crosses S, $N_{G'(M)}(C) \subseteq S$. Hence
C is a connected component of $G'(M) - S$. Further,
we have only added edges to obtain G'(M), so
$N_{G(M)}(C) \subseteq N_{G'(M)}(C)$. Therefore if C is a full component
of $G(M) - S$ we have $N_{G(M)}(C) = N_{G'(M)}(C) = S$,
and it is also a full component of $G'(M) - S$. By Lemma 2.5,
S is a minimal separator of G'(M).

Now consider a minimal separator S' of G(M). If an
F-edge f = xy crosses S', there is a four cycle
$x \to u \to y \to v \to x$ in G(M) with monochromatic pair u, v, and
further, $u, v \in S'$. Hence S' is illegal, and any legal minimal
separator of G(M) is not crossed by any F-edge. From
our previous argument, this implies $\Delta^{L}_{G(M)} \subseteq \Delta_{G'(M)}$.
Therefore $\Delta_{G'(M)} = \Delta^{L}_{G(M)}$. □



The second half of the proof of Theorem 3.3 proves the 
following. 

Corollary 3.4. Suppose M is a set of taxa on 3-state
characters that allows a perfect phylogeny. If $S \in \Delta^{L}_{G(M)}$
then C is a connected component of $G(M) - S$ if and only
if C is a connected component of $G'(M) - S$.

We now prove the main result of this section. 



Theorem 3.5. Suppose M is a set of taxa on 3-state characters.
Then M has a perfect phylogeny if and only if any
maximal pairwise parallel set of legal minimal separators
Q of G(M) induces a proper minimal triangulation $G(M)_Q$
of G(M).

Proof. First, suppose that M has a perfect phylogeny, and
let Q be a maximal pairwise parallel set of legal minimal
separators of G(M). We show that $G(M)_Q$ is a proper
triangulation of G(M). By Theorem 3.3, Q is a maximal set of
minimal separators of G'(M), and they are pairwise parallel
because the connected components of each minimal
separator in Q are the same in G(M) and G'(M)
(Corollary 3.4). Hence $H = G'(M)_Q$ is a minimal triangulation
of G'(M) with minimal separator set Q (Theorem 2.6),
and by Lemma 3.2, H is a proper minimal triangulation of
G(M). Because $\Delta_H = Q$, Theorem 2.6 implies Q is a maximal
pairwise parallel set of minimal separators of G(M)
and therefore $H = G(M)_Q$. Thus $H = G(M)_Q$ is a proper
minimal triangulation of G(M).

For the converse, pick any maximal pairwise parallel
set of legal minimal separators Q of G(M) that induces a
proper minimal triangulation $G(M)_Q$ of G(M). Then M
has a perfect phylogeny by Theorem 2.3. □

Proper triangulation algorithm 

In this section, we build on techniques developed in [14]
to generate the minimal separators of G'(M) and their
parallel relations in $O(nm^2)$ time. This will allow us to use a
greedy approach to pick a maximal pairwise parallel set of
legal minimal separators. These minimal separators will
then define a set of fill edges for a proper minimal
triangulation, and a perfect phylogeny will be constructed in the
form of a clique tree using Maximum Cardinality Search
(MCS).

Lemma 4.1. [14] Let Q be a set of maximal pairwise
parallel legal minimal separators of a partition intersection
graph G(M). Then for each $S \in Q$, $|S| < m$.

Define $\Delta^{L,m}_{G(M)} = \{S \in \Delta^{L}_{G(M)} : |S| < m\}$, the legal
minimal separators of G(M) with fewer than m vertices. We first
state our algorithm and then analyze the running time of each
step.

Algorithm: proper triangulation for 3-state characters 

1. Stop if there is a pair of characters whose partition 
intersection graph contains a cycle. 

2. Compute $\Delta^{L,m}_{G(M)}$ (the legal minimal separators with
fewer than m vertices) using proper clusters.

3. Stop if there is a monochromatic pair not separated
by any legal minimal separator.

4. Compute the crossing relations for $\Delta^{L,m}_{G(M)}$.

5. Greedily construct a maximal pairwise parallel subset
Q of $\Delta^{L,m}_{G(M)}$; stop if Q has more than 2n - 3 minimal
separators.

6. Add edges to G(M) to make each $S \in Q$ a clique.
Call this graph $G_Q$.

7. Use MCS to construct a clique tree for $G_Q$.
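Steps 5 and 6 can be sketched as follows. This is an illustration under assumptions rather than the paper's implementation: it tests the parallel relation directly from the graph definition (S is parallel to T if S - T lies in one component of G - T) instead of using the taxa-based crossing relations of step 4, it takes the list of separators as given, and all function names are hypothetical.

```python
def components(adj, removed):
    """Connected components of the graph after deleting `removed`."""
    seen, comps = set(removed), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def is_parallel(adj, S, T):
    """S is parallel to T if S - T fits inside one component of G - T."""
    rest = set(S) - set(T)
    return not rest or any(rest <= comp for comp in components(adj, T))

def greedy_parallel_subset(adj, separators):
    """Step 5: keep each separator that is parallel to all kept so far."""
    chosen = []
    for S in separators:
        if all(is_parallel(adj, S, T) for T in chosen):
            chosen.append(S)
    return chosen

def saturate(adj, chosen):
    """Step 6: return a copy of the graph with every chosen S made a clique."""
    filled = {v: set(nb) for v, nb in adj.items()}
    for S in chosen:
        for u in S:
            filled[u] |= set(S) - {u}
    return filled
```

A single pass yields a maximal subset, because any rejected separator crosses a separator that remains in the final selection. On the four cycle 1 - 2 - 3 - 4 - 1, the crossing separators {1, 3} and {2, 4} cannot both be kept; keeping {1, 3} adds the chord 1 - 3.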



Gysel et al. Algorithms for Molecular Biology 2012, 7:26
http://www.almob.org/content/7/1/26



We proceed with a series of lemmas that will be used
in Theorem 4.11 to show that each step is $O(nm^2)$. The
following simple observation is important for many of our
time bounds.

Observation 4.2. Let M be a set of taxa whose characters
have at most three states. Then G(M) has $O(m)$ vertices
(one vertex per state of each character) and $O(m^2)$ edges.

Step 2 of the algorithm uses concepts from [2,8,9,14],
which we detail here for completeness. A proper cluster
is a bipartition of the taxa (i.e. the taxa are split into two
disjoint nonempty sets) such that each character shares at
most one state across the bipartition, and at least one
character is not shared across this bipartition [8,9]. There are
$O(m)$ proper clusters when r is fixed. In particular,
suppose $\chi$ is not shared across the bipartition of a proper
cluster. Then the proper cluster also creates a bipartition
of $\chi$'s character states (see Figure 3). Hence, we can
compute the set of proper clusters by exhaustively checking,
for each character, if some bipartition of its states splits the
taxa into a proper cluster (there are $O(2^r)$ ways to split
each character).
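The exhaustive check described above can be sketched as follows. This is a minimal illustration, assuming M is a list of equal-length strings and taxa are identified by their row index; the function name is hypothetical, and the sketch makes no attempt to reach the time bounds discussed in the text.

```python
from itertools import combinations

def proper_clusters(M):
    """Return the bipartitions {A, B} of taxa indices that are proper clusters."""
    n, m = len(M), len(M[0])
    found = set()
    for i in range(m):
        states = sorted({row[i] for row in M})
        # Try every nonempty proper subset of the observed states of character i.
        for k in range(1, len(states)):
            for side in combinations(states, k):
                A = frozenset(t for t in range(n) if M[t][i] in side)
                B = frozenset(range(n)) - A
                # Each character may share at most one state across A | B;
                # character i itself shares none by construction.
                if all(len({M[t][j] for t in A} & {M[t][j] for t in B}) <= 1
                       for j in range(m)):
                    found.add(frozenset({A, B}))
    return found
```

For the rows 00, 01, 11 this returns the bipartitions {0, 1} | {2} and {0} | {1, 2}. For the four gametes 00, 01, 10, 11 no proper cluster exists.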

Proper clusters generate the legal minimal separators with
fewer than m vertices, $\Delta^{L,m}_{G(M)}$, as follows [14]. For a
connected component C of $G(M) - S$, let t(C) be the set of taxa
with character state $\chi^i_j$ for at least one $\chi^i_j \in C$. We
will refer to the set of t(C) determined by the connected
components of $G(M) - S$ as the S-partition of the taxa. Recall S
has at most m - 1 vertices by Lemma 4.1, so every taxon must
have a character state that is not a vertex of S. Hence
no taxon can have all of its character states as vertices
of S. Additionally, each taxon defines a clique, so it cannot
have vertices in more than one connected component
of $G(M) - S$ (this would define an edge between connected
components). By Lemma 2.5, $G(M) - S$ has two or
more full components $C_1$ and $C_2$. Place $t(C_1)$ and $t(C_2)$
in separate parts of the bipartition, then for the remaining
connected components C of $G(M) - S$ add t(C) to either
part. This defines a bipartition where the shared character
states (known as the splitting vector [9]) are exactly
the vertices of S. To see this, suppose a character state $\chi^i_j$
is a vertex of S. Because $C_1$ is a full component, there is
a vertex $\chi^{i'}_{j'} \in C_1$ adjacent to $\chi^i_j$. Because these vertices
are adjacent, $\chi^{i'}_{j'}$ and $\chi^i_j$ appear in the same row of M,
which in turn is a taxon $t_1$ of $t(C_1)$. Similarly, there exists
$t_2 \in t(C_2)$ such that $\chi^i(t_2) = \chi^i_j$, so $\chi^i_j$ is shared in the
bipartition. See Figure 3 for an illustration of these concepts.
This implies that $|\Delta^{L,m}_{G(M)}| = O(m)$. The following
two lemmas are special cases of those found in [14].

Lemma 4.3. [14] For any set of taxa M on 3-state characters,
$\Delta^{L,m}_{G(M)}$ can be computed in $O(nm^2)$ time. Further,
$|\Delta^{L,m}_{G(M)}| = O(m)$.

Proof. Our previous discussion proves that $\Delta^{L,m}_{G(M)}$ has at
most $O(m)$ minimal separators, so we focus on the running
time. Consider a proper cluster with splitting vector x
and let $S_x$ be the vertices of G(M) appearing as character
states in x. Define the equivalence relation g/x by the
transitive closure of the relation t R t' if and only if there
is a character $\chi^i$ where $\chi^i(t) = \chi^i(t') = \chi^i_j$ and $\chi^i_j$ is not
a shared character state in x; calculating g/x takes $O(nm)$
time [9]. Given an equivalence class [t] of g/x, the vertices
$\{\chi^i_j \notin S_x \mid \chi^i(t') = \chi^i_j \text{ for some } t' \in [t]\}$ are a connected
component of $G(M) - S_x$, and every connected component
can be described in this way. For a connected component
C of $G(M) - S_x$, the size of its neighborhood can
be calculated using the t(C) rows of M (i.e. for $t \in t(C)$,
count the character states of t also in x, being careful
not to overcount). $S_x \in \Delta_{G(M)}$ if and only if there are
distinct equivalence classes [t] and [t'] that share all
character states in x. For each equivalence class, we examine
each taxon once, so this requires a single pass through
every row of M and can be done in $O(nm)$ time per proper
cluster, so step 2 takes $O(nm^2)$ time. □

Figure 3 Minimal separators and proper clusters. In this figure, the bipartition ab|cdef gives rise to the proper clusters ab and cdef. The shared character states form a legal minimal separator S in G(M). $G(M) - S$ has three connected components, of which two are full (components $C_1$ and $C_2$). The S-partition gives rise to the bipartition because $t(C_1) = \{a, b\}$ and $t(C_2) \cup t(C_3) = \{c, d, e, f\}$. T is a clique tree for G(M) (in this case, G(M) happens to be chordal). T' is obtained from T by resolving the nodes labeled b, c, f. Note that S is represented in T' on edge bc because the maximal cliques at its two endpoints intersect in exactly the vertices of S. For a clique tree T of a chordal graph, every minimal separator of the chordal graph behaves this way [11,21]. In this sense, legal minimal separators are analogous to splitting vectors.

In the proof of Lemma 4.3, we showed how to compute
the S-partition of the taxa for $S \in \Delta^{L,m}_{G(M)}$ in $O(nm)$
time. It is now easy to calculate the connected components
of $G(M) - S$: if t(C) is part of the S-partition, then C
is obtained by listing the character states that appear in at
least one $t \in t(C)$ but not in S. This proves the following.

Lemma 4.4. [14] Let M be a set of 3-state taxa and
$S \in \Delta^{L,m}_{G(M)}$. There is an $O(nm)$ algorithm that calculates the
connected components of $G(M) - S$ and determines which
of these connected components is full.
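The computation behind Lemma 4.4 can be sketched without building G(M) explicitly. The sketch below is illustrative only: it assumes M is a list of equal-length strings, encodes a vertex as a pair `(character index, state)`, assumes S has fewer than m vertices so that every taxon keeps at least one state outside S, and uses hypothetical function names.

```python
def s_partition(M, S):
    """Group taxa (row indices) that share a character state outside S."""
    n = len(M)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_taxon = {}  # first row seen for each state outside S
    for t, row in enumerate(M):
        for vertex in enumerate(row):
            if vertex in S:
                continue
            if vertex in first_taxon:
                parent[find(t)] = find(first_taxon[vertex])
            else:
                first_taxon[vertex] = t
    classes = {}
    for t in range(n):
        classes.setdefault(find(t), []).append(t)
    return list(classes.values())

def components_from_taxa(M, S):
    """Connected components of G(M) - S, read off from the S-partition."""
    comps = []
    for cls in s_partition(M, S):
        comp = {v for t in cls for v in enumerate(M[t]) if v not in S}
        if comp:
            comps.append(comp)
    return comps
```

Each row is scanned once per call, which is in the spirit of the single pass over M described in the proof of Lemma 4.3.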

Before discussing the running time required to compute 
crossing relations, we first state two structural lemmas 
on minimal separators; the second follows from a lemma 
in [19]. 

Lemma 4.5. [18] Let S and S' be non-parallel minimal
separators. Then for each full component C of $G - S'$, S has
a vertex in C.

Lemma 4.6. (Lemma 3.10, [19]). Let S and S' be two
minimal separators of a graph G. Then S and S' are parallel
if and only if there exists a full component $C_S$ of
$G - S$ and a connected component $C_{S'}$ of $G - S'$ such that
$C_S \subseteq C_{S'}$.

Because of the slight change from Lemma 3.10 in [19] 
and for completeness, we give a proof of Lemma 4.6. 

Proof. Suppose $S$ and $S'$ are parallel. Since $S$ is a minimal
separator, there are at least two full components in $G - S$
and because $S'$ is parallel to $S$, there is a full component
$C_1$ of $G - S$ that does not intersect $S'$. $C_1$ is connected in
$G - S'$, so there is a connected component $C$ of $G - S'$
containing $C_1$.

Now, suppose there are $C_S$ and $C_{S'}$ satisfying the conditions
of the lemma. Then $S \subseteq N(C_S) \subseteq C_{S'} \cup N(C_{S'}) \subseteq
C_{S'} \cup S'$, implying that $S$ and $S'$ are parallel. □

Lemma 4.7. There is an $O(nm^2)$ algorithm to calculate
the crossing relations of $\Delta^{*}_{G(M)}$.

Proof. Let $S, S' \in \Delta^{*}_{G(M)}$. We begin by showing that $S$
and $S'$ are parallel if and only if there is a full component
$C$ of $G(M) - S$ and connected component $C'$ of $G(M) - S'$
such that $t(C) \subseteq t(C')$ (i.e. $t(C)$ is contained in a single
part of the $S'$-partition). Suppose $S$ and $S'$ are parallel.
From Lemma 4.6, there are connected components $C$ of
$G(M) - S$ and $C'$ of $G(M) - S'$ such that $C \subseteq C'$ and
consequently $t(C) \subseteq t(C')$.

Conversely, assume that $S$ and $S'$ are not parallel. Let $C_1$
be a full component of $G(M) - S$ and $C_2$ be a full component
of $G(M) - S'$. By Lemma 4.5, there is a vertex
$v \in C_1 \cap S'$, and because $C_2$ is full, there is a $u \in C_2 \cap N(v)$.
The taxa form an edge clique cover for $G(M)$, so there
is a taxon $t$ having both character states corresponding
to $u$ and $v$. Note $v \in C_1$ so $t \in t(C_1)$ and $u \in C_2$ so
$t \in t(C_2)$. $G(M) - S'$ has at least two full components, and repeating
this argument yields another full component $C_2' \neq C_2$
of $G(M) - S'$ such that $t(C_1) \cap t(C_2') \neq \emptyset$. Thus $t(C_1)$
shares at least one taxon with at least two parts of the
$S'$-partition, so $t(C_1)$ is not contained within any single
part of the partition. This proves our characterization
of parallel minimal separators of $\Delta^{*}_{G(M)}$.

It suffices to check for each full component $C$ of $G(M) - S$
and connected component $C'$ of $G(M) - S'$ if $t(C) \subseteq
t(C')$. There are $O(m^2)$ pairs of legal minimal separators,
and this check takes $O(n)$ time ($O(nm^2)$ time overall)
when the $S$-partition has been calculated for each $S \in
\Delta^{*}_{G(M)}$. □
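
The containment test in this proof can be sketched as below. This is a minimal illustration, assuming the taxa sets $t(C)$ have already been computed for both separators; the function name `are_parallel` and its argument layout are hypothetical. Labelling each taxon by its part of the $S'$-partition keeps the check linear in the number of taxa.

```python
def are_parallel(full_parts_s, parts_s_prime, n):
    """Test whether some t(C), C a full component of G(M) - S, lies inside
    a single part of the S'-partition.

    full_parts_s:  taxa sets t(C) for the full components C of G(M) - S.
    parts_s_prime: taxa sets t(C') for all components C' of G(M) - S'.
    n:             number of taxa, indexed 0 .. n-1 (assumed layout).
    """
    label = [None] * n  # part of the S'-partition containing each taxon
    for idx, part in enumerate(parts_s_prime):
        for t in part:
            label[t] = idx
    for part in full_parts_s:
        labels = {label[t] for t in part}
        # contained in one part: exactly one label, and no unlabelled taxon
        if len(labels) == 1 and None not in labels:
            return True
    return False
```

For instance, with four taxa whose partition intersection graph is a 4-cycle, the two separators give taxa sets `[{0, 3}, {1, 2}]` and `[{0, 1}, {2, 3}]`, and the sketch reports that they cross.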

It is critical for our time bound that any proper mini- 
mal triangulation of G{M) have 0{n) minimal separators 
because this impacts the computation of edges contained 
in the proper minimal triangulation. A result bounding 
the number of minimal separators in an earlier version of 
this paper (Lemma 7 in [20]) was incorrect, as demon- 
strated in Figure 4. We present a corrected bound for the 
number of minimal separators in the following Lemma. 

Lemma 4.8. Suppose that $H$ is a proper minimal triangulation
of $G(M)$. Then $H$ has at most $2n - 3$ minimal
separators.

Proof. Let $T$ be a clique tree of $H$. Recall that the nodes
of $T$ are in bijection with the maximal cliques of $H$. To
make this correspondence explicit, for each node $x$ of $T$
we will write $K_x$ to mean the maximal clique of $H$ that
corresponds to $x$. A classic result in chordal graph theory
says that if $S \in \Delta_H$, there is an edge $xy$ of $T$ such that
$S = K_x \cap K_y$ [11,21]. Therefore the number of minimal
separators in $H$ is at most the number of edges of $T$.

Figure 4 Bounding minimal separators of proper minimal triangulations. An example of the bound in Lemma 4.8. $H$ has five minimal
separators, obtained from all but one of the pairs of vertices drawn from a set of four character states. There are four taxa, so Lemma 4.8 gives an upper
bound of five minimal separators. Therefore, the bound in Lemma 4.8 is tight for $n = 4$ taxa.

First, consider any leaf $a$ of $T$. We claim that $K_a$ contains
a vertex of $H$ that is not in any other maximal clique of
$H$ (this fact is well known in the chordal graph literature
[22], but we prove it here for completeness). Suppose $a'$ is
the neighbor of $a$ in $T$. By maximality, $K_a \not\subseteq K_{a'}$ so there
is a vertex $v$ of $H$ that is contained in $K_a$ but not contained
in $K_{a'}$. If $v$ is contained in a maximal clique of $H$ that is not
$K_a$, then the second property of clique trees implies that
$v \in K_{a'}$ as well. Hence $v$ is only contained in $K_a$, proving
the claim. Further, $v$ is some character state $\chi^i_j$, and there
is a taxon $t$ of $M$ such that $\chi^i(t) = j$. Taxon $t$ can only label
$a$ because no other node of $T$ corresponds to a maximal
clique that contains $\chi^i_j$. Thus for each leaf of $T$ there is a
unique taxon that labels it.

To complete the proof, we show a similar result for internal
nodes of $T$ with degree two. Let $z$ be such a node with
neighbors $z_1$ and $z_2$. If $z$'s maximal clique $K_z$ contains a vertex
that lies in no other maximal clique, our previous argument
shows that $z$ can be labeled by a unique taxon. Suppose
this is not the case. Let $S_i = K_z \cap K_{z_i}$ for $i = 1, 2$. It must
be that $K_z = S_1 \cup S_2$ because we are considering the case
when $K_z$ does not contain a unique vertex. Further, we
cannot have $S_1 \subseteq S_2$ since otherwise $K_z = S_2 \subseteq K_{z_2}$ would
not be maximal. Similarly, $S_2 \not\subseteq S_1$. Pick $u_1 \in S_1 - S_2$ and
$u_2 \in S_2 - S_1$, noting that $u_1 \notin K_{z_2}$ and $u_2 \notin K_{z_1}$. We
argue that $K_z$ is the only maximal clique containing both
$u_1$ and $u_2$. This is because if any other maximal clique $K$
contains both vertices, then either $K_{z_1}$ or $K_{z_2}$ is on the path
from $K_z$ to $K$ in $T$ ($z$ has degree two) and by the second
property of clique trees, this maximal clique also contains
both vertices. Further, because each $S \in \Delta_H$ is of the form
$S = K_x \cap K_y$ for an edge $xy$ of $T$, there is no minimal separator
of $H$ containing both $u_1$ and $u_2$. By Theorem 2.6,
$u_1 u_2 = \chi^{i_1}_{j_1} \chi^{i_2}_{j_2}$ is an edge of $G(M)$ (i.e. it is not a fill edge)
because $H$ is a minimal triangulation of $G(M)$, so all fill
edges come from saturating each $S \in \Delta_H$. Therefore there
is a taxon $t'$ of $M$ such that $\chi^{i_1}(t') = j_1$ and $\chi^{i_2}(t') = j_2$.
As in the unique vertex case, $z$ is the unique node with
label $t'$.

Therefore any node of $T$ with degree at most two is
labeled by a unique taxon, implying there are at most $n$
such nodes. Any tree containing at most $n$ leaves and
internal nodes of degree two has at most $2n - 3$ edges.
Hence $T$ has at most $2n - 3$ edges, and in turn $H$ has at
most $2n - 3$ minimal separators, proving the bound. □
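
The last counting step can be spelled out with a degree-sum argument. The symbols $\ell$, $d$ and $k$ are introduced here only for this derivation: they denote the number of leaves, of degree-two nodes, and of nodes with degree at least three in $T$, where $T$ is assumed to have at least two nodes and $\ell + d \le n$ by the labeling argument.

```latex
\begin{align*}
2(\ell + d + k - 1) = \sum_{x \in V(T)} \deg(x) &\ge \ell + 2d + 3k
  \quad\Longrightarrow\quad k \le \ell - 2, \\
|E(T)| = \ell + d + k - 1 &\le 2\ell + d - 3 \le 2(\ell + d) - 3 \le 2n - 3 .
\end{align*}
```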

Remark. The proof of Lemma 4.8 requires minimal- 
ity of the triangulation, but it does not require that M 
lacks missing values or that the number of states for each 
character is bounded. 



This lemma, along with the fact that each $S \in \Delta^{*}_{G(M)}$
has fewer than $m$ vertices, gives the following result.

Lemma 4.9. Suppose that $H$ is a proper minimal triangulation
of $G(M)$ obtained by saturating a maximal
pairwise parallel legal set of minimal separators $Q$. Then
$H$ has $O(n)$ minimal separators, $O(m)$ vertices, and $O(m^2)$
edges. Furthermore, $H$ can be calculated in $O(nm^2)$ time.

Proof. The minimal separator bound follows from
Lemma 4.8, and the vertex and edge bounds follow from
Observation 4.2 and the fact that $H$ and $G(M)$ have the
same vertex set. In order to calculate $H$, we must calculate
the fill edge set $E(H) - E(G(M))$. Recall that, by
Theorem 2.6, the fill edges of $H$ are obtained by saturating
each minimal separator in $Q$. Each $S \in Q$ has fewer than
$m$ vertices by Lemma 4.1 and $|Q| = O(n)$ by Lemma 4.8.
It is straightforward to check for each $S \in Q$ and each pair
$u, v \in S$ if $uv$ defines a fill edge with an amortized running
time of $O(nm^2)$. □
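
Saturating the separators of $Q$ can be sketched as follows. This is a minimal illustration, assuming the edges of $G(M)$ are given as vertex pairs and each separator as a set of vertices; the name `fill_edges` is hypothetical and the sketch makes no attempt at the amortized bookkeeping mentioned in the proof.

```python
from itertools import combinations


def fill_edges(edges, separators):
    """Return E(H) - E(G): the pairs added by making each separator a clique.

    edges:      iterable of 2-element vertex pairs, the edges of G.
    separators: iterable of vertex sets to saturate (the set Q).
    """
    existing = {frozenset(e) for e in edges}
    fill = set()
    for sep in separators:
        for u, v in combinations(sep, 2):
            pair = frozenset((u, v))
            if pair not in existing:
                fill.add(pair)  # uv is absent from G, so it is a fill edge
    return fill
```

On a 4-cycle `a-b-c-d` with the single separator `{a, c}`, the only fill edge returned is `{a, c}`, which is the chord that triangulates the cycle.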

In [23], Tarjan and Yannakakis developed Maximum 
Cardinality Search (MCS), which recognizes chordal 
graphs in linear time. Blair and Peyton [11] showed how 
MCS can be used to construct a clique tree for a chordal 
graph while retaining the linear time bound. 

Lemma 4.10. [11] Let $G$ be a chordal graph. Then Maximum
Cardinality Search (MCS) can be implemented to
produce a clique tree $T$ of $G$ with running time $O(|V(G)| +
|E(G)|)$.
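
For orientation, a simplified version of MCS is sketched below. It only computes the visiting order and uses it to test chordality; it does not build the clique tree of [11], and its quadratic vertex selection does not achieve the linear bound of the lemma. The function names and the adjacency-dictionary layout are assumptions made for this illustration.

```python
def mcs_order(adj):
    """Visit vertices by Maximum Cardinality Search: always pick an
    unvisited vertex with the most already-visited neighbours."""
    weight = {v: 0 for v in adj}
    unvisited = set(adj)
    order = []
    while unvisited:
        v = max(unvisited, key=lambda x: weight[x])
        order.append(v)
        unvisited.remove(v)
        for w in adj[v]:
            if w in unvisited:
                weight[w] += 1
    return order


def is_chordal(adj):
    """A graph is chordal exactly when, in an MCS order, the earlier-visited
    neighbours of every vertex form a clique."""
    order = mcs_order(adj)
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [w for w in adj[v] if pos[w] < pos[v]]
        if not earlier:
            continue
        # It is enough to compare against the most recently visited one.
        p = max(earlier, key=lambda w: pos[w])
        if any(w != p and w not in adj[p] for w in earlier):
            return False
    return True
```

A 4-cycle is rejected by `is_chordal`, while the same cycle with one chord added is accepted.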

Combining these lemmas shows that our minimal separator
algorithm for constructing perfect phylogenies for
$r = 3$ is competitive with the algorithm of Dress and Steel
[1], giving our main result.

Theorem 4.11. The algorithm Proper Triangulation for
3-State Characters runs in $O(nm^2)$ time.

Proof. The first step can be implemented in $O(m^2)$ time
as follows. Each pair of characters has a partition intersection
graph with at most six vertices, and it is straightforward
to check for cycles. There are $O(m^2)$ such pairs of
characters. Lemma 4.3 states that step two takes $O(nm^2)$
time. For the third and fourth steps, we first compute the
connected components of $G(M) - S$ for each $S \in \Delta^{*}_{G(M)}$.
Lemmas 4.3 and 4.4 tell us there are $O(m)$ computations
that require $O(nm)$ time, so computing all the sets of connected
components takes $O(nm^2)$ time. There are $O(m)$
monochromatic pairs (three pairs per character), and for
each monochromatic pair $\chi^i_j, \chi^i_k$ we check the connected
components of each $S \in \Delta^{*}_{G(M)}$ and ensure at least one of
these minimal separators is a $(\chi^i_j, \chi^i_k)$-separator. Hence
step three takes $O(m^2)$ time. Lemma 4.7 shows that step
four has a running time of $O(nm^2)$. Step five runs in
$O(nm)$ time due to the bounds in Lemmas 4.3 and 4.8.
That is, after picking a minimal separator $S$ to be in $Q$,
there are $O(m)$ minimal separators that can cross $S$ and we
repeat this process $O(n)$ times to construct $Q$. Constructing
$G_Q$ was shown to take $O(nm^2)$ time in Lemma 4.9.
Lemma 4.9 shows that $O(|V(G_Q)| + |E(G_Q)|) = O(m^2)$ so
using MCS in step 7 takes $O(m^2)$ time. Hence each step
and the pre-processing for each step takes at most $O(nm^2)$
time, so the algorithm takes at most $O(nm^2)$ time. □
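
The pairwise check described for the first step can be sketched as below. This is an illustration under stated assumptions rather than the paper's implementation: $M$ is taken to be a list of rows without missing values, the names `has_cycle` and `cyclic_pairs` are hypothetical, and cycles are detected with a small union-find over the at most six vertices of each two-character partition intersection graph.

```python
from itertools import combinations


def has_cycle(edges):
    """Union-find cycle test: an edge joining two already-connected
    vertices closes a cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru == rw:
            return True
        parent[ru] = rw
    return False


def cyclic_pairs(matrix):
    """Return the pairs of characters (column indices) whose
    two-character partition intersection graph contains a cycle."""
    m = len(matrix[0])
    cols = [[row[c] for row in matrix] for c in range(m)]
    bad = []
    for a, b in combinations(range(m), 2):
        # one edge per distinct (state of a, state of b) seen in some taxon
        edges = {(("a", x), ("b", y)) for x, y in zip(cols[a], cols[b])}
        if has_cycle(edges):
            bad.append((a, b))
    return bad
```

Two binary columns exhibiting all four state combinations form a 4-cycle and are reported, whereas the columns of `[[0, 0], [0, 1], [1, 1]]` form a path and are not.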

Large partition intersection graphs 

Ideally, one would like to find an $O(n^2 m)$ or $O(nm)$ algorithm
for 3-state perfect phylogeny (i.e., a bound in which $m$ appears without a square).
In this section, we will construct a family of 3-state matrices
$M$ that have a perfect phylogeny and $\Theta(m^2)$ edges
in $G(M)$. This discourages attempts to improve our time
bound using an approach that explicitly computes the
partition intersection graph.

Any 3-state character compatible with a perfect phylogeny
can be obtained by choosing any two edges
of the phylogeny, removing them, and using the three
resulting subtrees to define each taxon's state for that character.
2-state characters are obtained in a similar manner,
removing a single edge instead of two edges. Therefore, if
a 3-state matrix $M$ with distinct columns (up to relabeling)
has a perfect phylogeny, then $m = O(n^2)$, since a phylogeny on $n$ taxa
has $O(n)$ edges and each character is determined by a choice of at most two of them.

Consider the tree $T$ with taxa $t_1, t_2, \ldots, t_n$ as depicted in
Figure 5, and suppose $i < j$. We construct the character
$\chi^{(i,j)}$ using the partition $\{t_1, t_2, \ldots, t_i\}$, $\{t_{i+1}, t_{i+2}, \ldots, t_j\}$,
$\{t_{j+1}, t_{j+2}, \ldots, t_n\}$ as in Figure 5. The sets in the partition are
called cell 0, cell 1, and cell 2 of $\chi^{(i,j)}$, respectively. That
is, $\chi^{(i,j)}(t_1) = 0$, $\chi^{(i,j)}(t_{i+1}) = 1$, $\chi^{(i,j)}(t_{j+1}) = 2$, and so on.
Let $M^*$ be the matrix whose columns are the characters
$\chi^{(i,j)}$ for $1 \le i < j < n$. $T$ is clearly a perfect phylogeny for
$M^*$, and $m = \binom{n-1}{2} = \Theta(n^2)$. Next, we show that $G(M^*)$
has $\Theta(m^2)$ edges.

Figure 5 Characters of a perfect phylogeny with a large partition intersection graph. A 3-state character created using intervals of taxa from
a fully resolved tree $T$. The 0th piece of $\chi^{(i,j)}$ is the interval $\chi^{(i,j)}_0$.

Observation 5.1. Let $\chi^{(i,j)}$ and $\chi^{(i',j')}$ be distinct characters
of $M^*$. Then $\chi^{(i,j)}_k \chi^{(i',j')}_{k'}$ is an edge of $G(M^*)$ if and only if cell $k$ of
$\chi^{(i,j)}$ and cell $k'$ of $\chi^{(i',j')}$ have a non-empty intersection
(i.e. share a taxon).

For example, if cell 1 of one character and cell 1 of another
share a taxon, the two corresponding vertices are joined by an
edge in $G(M^*)$. In contrast, if cell 0 of one character and cell 1
of another do not share any taxa, the corresponding vertices are
not adjacent in $G(M^*)$. Consider the characters
$\chi^{(i,j)}$ and $\chi^{(i',j')}$ for distinct $i$, $j$, $i'$, and $j'$. There are
$\Omega(n^4)$ pairs of these characters, and each such pair
provides at least one edge to $G(M^*)$ because both cell 0 of
$\chi^{(i,j)}$ and cell 0 of $\chi^{(i',j')}$ share $t_1$. Therefore $G(M^*)$ has at
least $\Omega(n^4) = \Omega(m^2)$ edges. There are at most $\binom{3m}{2}$ edges
in any partition intersection graph, so $G(M^*)$ has $\Theta(m^2)$
edges, and reading each entry of $M$ to compute $G(M)$
requires at least $nm$ time. Hence any construction algorithm
that explicitly computes the partition intersection
graph requires at least $\Omega(nm + m^2)$ time.
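
The growth of $G(M^*)$ can be checked on small instances with a sketch like the following. It assumes the interval construction described above with characters indexed by $1 \le i < j < n$; the function names `build_m_star` and `pig_edges` are hypothetical, and the partition intersection graph is built explicitly, which is exactly the cost the argument warns about.

```python
from itertools import combinations


def build_m_star(n):
    """Characters (i, j) with 1 <= i < j < n; taxon t_k (k = 1..n) gets
    state 0 if k <= i, state 1 if i < k <= j, and state 2 otherwise."""
    chars = [(i, j) for i in range(1, n) for j in range(i + 1, n)]
    matrix = [[0 if k <= i else 1 if k <= j else 2 for (i, j) in chars]
              for k in range(1, n + 1)]
    return chars, matrix


def pig_edges(matrix):
    """Edges of the partition intersection graph: two (character, state)
    vertices are adjacent when some taxon carries both."""
    edges = set()
    for row in matrix:
        vertices = list(enumerate(row))
        edges.update(frozenset(p) for p in combinations(vertices, 2))
    return edges
```

For `n = 4` this produces three characters and twelve edges. In general, the row of the first taxon alone contributes one edge for every pair of characters, which is the source of the quadratic lower bound in $m$.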

Conclusions 

We have demonstrated how to use the minimal separator
approach introduced in [14] to construct a perfect
phylogeny for 3-state data in $O(nm^2)$ time. We also constructed
a 3-state matrix $M$ with a perfect phylogeny whose partition
intersection graph $G(M)$ has $\Theta(m^2)$ edges. Thus, any explicit analysis of the edges
of $G(M)$ or of a proper triangulation of $G(M)$ is inadequate
to speed up our approach. Faster proper triangulation
algorithms should compute on $M$ directly instead of on $G(M)$,
aided by theoretical results about $G(M)$. Constructing
tree representations in order to minimally triangulate a 
graph without explicitly computing the fill edges was stud- 
ied in [19] in order to achieve a faster time bound, and it 
would be interesting to see if these ideas can be extended 
to find a faster construction algorithm for 3-state perfect 
phylogeny. 